A fast HTML web page change detection approach based on hashing and reducing the number of similarity computations
Authors
Abstract
This paper describes a fast HTML Web page change detection approach that saves computation time in two ways: it limits the similarity computations between two versions of a Web page to nodes having the same HTML tag type, and it hashes the Web page to provide direct access to node information. This efficient approach is suitable as a client application and for implementing server applications that help users monitor modifications made to HTML Web pages over time, and that report and visualize changes and trends to give insight into the significance and types of such changes. Changes across two versions of a page are detected by performing similarity computations after transforming the Web page into an XML-like structure in which a node corresponds to an open-closed HTML tag. Performance and detection reliability results were obtained, and showed speed improvements when compared to the results of a previous approach.
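The two ideas in the abstract (grouping nodes in a hash table keyed by tag type, and comparing only nodes that share a tag) can be illustrated with a minimal sketch. This is not the paper's algorithm: the names `NodeCollector`, `index_by_tag`, and `detect_changes`, the use of `difflib.SequenceMatcher` as the similarity measure, and the `threshold` value are all assumptions made for illustration, and the input is assumed to be well-formed HTML.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from html.parser import HTMLParser


class NodeCollector(HTMLParser):
    """Collects one (tag, direct text) node per open-close tag pair."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags as (tag, text_parts)
        self.nodes = []  # closed nodes as (tag, text)

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, []))

    def handle_data(self, data):
        # Attach non-blank text to the innermost open tag.
        if self.stack and data.strip():
            self.stack[-1][1].append(data.strip())

    def handle_endtag(self, tag):
        if not any(open_tag == tag for open_tag, _ in self.stack):
            return  # stray closing tag, ignore it
        while True:
            open_tag, parts = self.stack.pop()
            if open_tag == tag:
                self.nodes.append((tag, " ".join(parts)))
                return


def index_by_tag(html):
    """Hash table: tag name -> list of node texts (hypothetical layout)."""
    collector = NodeCollector()
    collector.feed(html)
    collector.close()
    table = defaultdict(list)
    for tag, text in collector.nodes:
        table[tag].append(text)
    return table


def similarity(a, b):
    # Stand-in similarity measure; the paper's own measure is not given here.
    return SequenceMatcher(None, a, b).ratio()


def detect_changes(old_html, new_html, threshold=0.5):
    """Compare nodes only against nodes of the same tag type."""
    old_index, new_index = index_by_tag(old_html), index_by_tag(new_html)
    changes = []
    for tag in sorted(set(old_index) | set(new_index)):
        unmatched = list(new_index.get(tag, []))
        for old_text in old_index.get(tag, []):
            best = max(unmatched, key=lambda t: similarity(old_text, t), default=None)
            if best is None or similarity(old_text, best) < threshold:
                changes.append(("deleted", tag, old_text))
                continue
            unmatched.remove(best)
            if best != old_text:
                changes.append(("modified", tag, f"{old_text} -> {best}"))
        changes.extend(("inserted", tag, text) for text in unmatched)
    return changes
```

Calling `detect_changes(old_page, new_page)` on two versions of a page returns a list of `(kind, tag, detail)` tuples. Because a `p` node is only ever scored against other `p` nodes, the number of similarity computations grows with the size of each tag group rather than with the square of the whole page.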
Similar resources
A Survey on Web Page Change Detection System Using Different Approaches
Due to limited network and computational resources, it is often difficult to monitor the sources constantly to check for changes and to download changed data items to the copies. The detection of changes across two versions of a page is accomplished by performing similarity computations after transforming the web page into an XML-like structure in which a node corresponds to an open-close HTML t...
Web Anomaly Detection by Building Access Usage Profiles
Due to the increase in cyber-attacks, the need for techniques that detect attacks on web servers has drawn attention. Unfortunately, many available security solutions are inefficient at identifying web-based attacks. The main aim of this study is to detect abnormal web navigation based on web usage profiles. In this paper, comparing scrolling behavior of a normal user with an attacker, and simu...
Hybrid Adaptive Educational Hypermedia Recommender Accommodating User’s Learning Style and Web Page Features
Personalized recommenders have proved to be of use as a solution to reduce the information overload problem. Especially in Adaptive Hypermedia System, a recommender is the main module that delivers suitable learning objects to learners. Recommenders suffer from the cold-start and the sparsity problems. Furthermore, obtaining learner’s preferences is cumbersome. Most studies have only focused...
Identifying Clones in Dynamic Web Sites Using Similarity Thresholds
We propose an approach to automatically detect duplicated pages in dynamic Web sites based on the analysis of both the page structure, implemented by specific sequences of HTML tags, and the displayed content. In addition, for each pair of dynamic pages we also consider the similarity degree of their scripting code. The similarity degree of two pages is computed using different similarity metrics...
HTML Page Analysis Based on Visual Cues
In this paper, we present a novel approach to automatically analyzing semantic structure of HTML pages based on detecting visual similarities of content objects on web pages. The approach is developed based on the observation that in most web pages, layout styles of subtitles or records of the same content category are consistent and there are apparent separation boundaries between different ca...
Journal title:
- Data Knowl. Eng.
Volume 66 Issue
Pages -
Publication date 2008